Detect Any Deepfakes: Segment Anything Meets Face Forgery Detection and Localization
The rapid advancements in computer vision have stimulated remarkable progress
in face forgery techniques, capturing the dedicated attention of researchers
committed to detecting forgeries and precisely localizing manipulated areas.
Nonetheless, with limited fine-grained pixel-wise supervision labels, deepfake
detection models perform unsatisfactorily on precise forgery detection and
localization. To address this challenge, we introduce a well-trained vision
segmentation foundation model, the Segment Anything Model (SAM), into face
forgery detection and localization. Based on SAM, we propose the Detect Any
Deepfakes (DADF) framework with the Multiscale Adapter, which can capture
short- and long-range forgery contexts for efficient fine-tuning. Moreover, to
better identify forged traces and augment the model's sensitivity towards
forgery regions, a Reconstruction Guided Attention (RGA) module is proposed. The
proposed framework seamlessly integrates end-to-end forgery localization and
detection optimization. Extensive experiments on three benchmark datasets
demonstrate the superiority of our approach for both forgery detection and
localization. The codes will be released soon at
https://github.com/laiyingxin2/DADF
Diagnosis and Treatment of Tracheal or Bronchuotracheal Adenoid Cystic Carcinoma
Background and objective: Adenoid cystic carcinoma is a primary bronchopulmonary carcinoma of low malignancy. We retrospectively studied 43 patients treated in our hospital over the past 50 years. The aim of this study is to discuss the clinical symptoms, pathologic characteristics, and therapeutic methods of primary tracheal or bronchotracheal adenoid cystic carcinoma. Methods: This study summarizes a total of 43 patients with primary tracheal or bronchial adenoid cystic carcinoma treated in our hospital from Jan. 1958 to Dec. 2007. Of these, 40 patients were treated by surgical resection and 3 by interventional treatment via fiberoptic bronchoscope. Results: The 1-, 3-, and 5-year survival rates of these patients were 100% (41/41), 89.5% (34/38), and 87.1% (27/31), respectively. Conclusion: Primary tracheal or bronchial adenoid cystic carcinoma is a rare carcinoma of low malignancy. Its clinical symptoms are not typical. The best treatment is early detection followed by surgery plus radiotherapy. Interventional treatment via fiberoptic bronchoscope is a palliative alternative.
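The reported survival percentages can be re-derived from the counts given in the Results. The short check below is a sketch only: it assumes each pair is (survivors, patients with follow-up at that time point), which the abstract does not state explicitly, and the variable names are illustrative.

```python
# Counts copied from the abstract's Results; reading them as
# (survivors, patients followed up) is an assumption.
followed = {"1-year": (41, 41), "3-year": (34, 38), "5-year": (27, 31)}

# Percentage survival at each time point, rounded to one decimal place.
rates = {
    label: round(100 * alive / total, 1)
    for label, (alive, total) in followed.items()
}

for label, rate in rates.items():
    print(f"{label}: {rate}%")
```

The denominators shrink from 41 to 31, which is consistent with fewer patients having reached the longer follow-up times.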
Learning Meta Model for Zero- and Few-shot Face Anti-spoofing
Face anti-spoofing is crucial to the security of face recognition systems.
Most previous methods formulate face anti-spoofing as a supervised learning
problem to detect various predefined presentation attacks, which needs
large-scale training data to cover as many attacks as possible. However, the trained
model easily overfits to several common attacks and is still vulnerable to
unseen attacks. To overcome this challenge, the detector should: 1) learn
discriminative features that can generalize to unseen spoofing types from
predefined presentation attacks; 2) quickly adapt to new spoofing types by
learning from both the predefined attacks and a few examples of the new
spoofing types. Therefore, we define face anti-spoofing as a zero- and few-shot
learning problem. In this paper, we propose a novel Adaptive Inner-update Meta
Face Anti-Spoofing (AIM-FAS) method to tackle this problem through
meta-learning. Specifically, AIM-FAS trains a meta-learner focusing on the task
of detecting unseen spoofing types by learning from predefined living and
spoofing faces and a few examples of new attacks. To assess the proposed
approach, we propose several benchmarks for zero- and few-shot FAS. Experiments
show its superior performance on the presented benchmarks over existing methods
in existing zero-shot FAS protocols. Comment: Accepted by AAAI202
GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning
The rapid advancement of photorealistic generators has reached a critical
juncture where authentic and manipulated images are increasingly
indistinguishable. Thus, benchmarking and advancing techniques for detecting
digital manipulation has become an urgent issue. Although there have been
a number of publicly available face forgery datasets, the forgery faces are
mostly generated using GAN-based synthesis technology, which does not involve
the most recent technologies like diffusion. The diversity and quality of
images generated by diffusion models have been significantly improved and thus
a much more challenging face forgery dataset should be used to evaluate SOTA
forgery detection methods. In this paper, we propose a large-scale, diverse,
and fine-grained high-fidelity dataset, namely GenFace, to facilitate the
advancement of deepfake detection, which contains a large number of forgery
faces generated by advanced generators such as the diffusion-based model and
more detailed labels about the manipulation approaches and adopted generators.
In addition to evaluating SOTA approaches on our benchmark, we design an
innovative cross appearance-edge learning (CAEL) detector to capture
multi-grained appearance and edge global representations, and detect
discriminative and general forgery traces. Moreover, we devise an
appearance-edge cross-attention (AECA) module to explore the various
integrations across two domains. Extensive experiment results and
visualizations show that our detection model outperforms the state of the art
in different settings such as cross-generator, cross-forgery, and cross-dataset
evaluations. Code and datasets will be available at
https://github.com/Jenine-321/GenFac
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Remote photoplethysmography (rPPG), which aims at measuring heart activities
and physiological signals from facial video without any contact, has great
potential in many applications (e.g., remote healthcare and affective
computing). Recent deep learning approaches focus on mining subtle rPPG clues
using convolutional neural networks with limited spatio-temporal receptive
fields, which neglect the long-range spatio-temporal perception and interaction
for rPPG modeling. In this paper, we propose the PhysFormer, an end-to-end
video transformer based architecture, to adaptively aggregate both local and
global spatio-temporal features for rPPG representation enhancement. As key
modules in PhysFormer, the temporal difference transformers first enhance the
quasi-periodic rPPG features with temporal difference guided global attention,
and then refine the local spatio-temporal representation against interference.
Furthermore, we also propose label distribution learning and a
curriculum-learning-inspired dynamic constraint in the frequency domain, which
provide elaborate supervision for PhysFormer and alleviate overfitting. Comprehensive
experiments are performed on four benchmark datasets to show our superior
performance in both intra- and cross-dataset testing. One highlight is that,
unlike most transformer networks, which need pretraining on large-scale datasets,
the proposed PhysFormer can be easily trained from scratch on rPPG datasets,
which makes it promising as a novel transformer baseline for the rPPG
community. The codes will be released at
https://github.com/ZitongYu/PhysFormer. Comment: Accepted by CVPR202
Hyperbolic Face Anti-Spoofing
Learning generalized face anti-spoofing (FAS) models against presentation
attacks is essential for the security of face recognition systems. Previous FAS
methods usually encourage models to extract discriminative features whose
distances within the same class (bonafide or attack) are pulled close while
those between bonafide and attack are pushed apart. However, these methods are
designed based on Euclidean distance, which lacks generalization ability for
unseen attack detection due to poor hierarchy embedding ability. According to
the evidence that different spoofing attacks are intrinsically hierarchical, we
propose to learn richer hierarchical and discriminative spoofing cues in
hyperbolic space. Specifically, for unimodal FAS learning, the feature
embeddings are projected into the Poincaré ball, and then the hyperbolic
binary logistic regression layer is cascaded for classification. To further
improve generalization, we conduct hyperbolic contrastive learning for the
bonafide only while relaxing the constraints on diverse spoofing attacks. To
alleviate the vanishing gradient problem in hyperbolic space, a new feature
clipping method is proposed to enhance the training stability of hyperbolic
models. Besides, we further design a multimodal FAS framework with Euclidean
multimodal feature decomposition and hyperbolic multimodal feature fusion &
classification. Extensive experiments on three benchmark datasets (i.e., WMCA,
PADISI-Face, and SiW-M) with diverse attack types demonstrate that the proposed
method can bring significant improvement compared to the Euclidean baselines on
unseen attack detection. In addition, the proposed framework also
generalizes well to four benchmark datasets (i.e., MSU-MFSD, IDIAP
REPLAY-ATTACK, CASIA-FASD, and OULU-NPU) with a limited number of attack types.
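As a rough illustration of projecting Euclidean feature embeddings into the Poincaré ball, the sketch below uses the standard exponential map at the origin, preceded by a generic norm clipping step. It is not the authors' implementation: the clipping rule, the radius `max_norm`, the curvature parameter `c`, and the function names are all assumptions chosen for illustration, and the abstract's own feature clipping method may differ.

```python
import numpy as np

def clip_features(x, max_norm=2.0):
    """Rescale each row so its Euclidean norm is at most max_norm.

    Generic norm clipping, assumed here for illustration only.
    """
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.minimum(1.0, max_norm / np.maximum(norm, 1e-12))
    return x * scale

def expmap0(v, c=1.0):
    """Exponential map at the origin of the Poincare ball with curvature -c.

    Maps a Euclidean (tangent) vector v to a point with norm < 1/sqrt(c).
    """
    sqrt_c = np.sqrt(c)
    norm = np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)
    return np.tanh(sqrt_c * norm) * v / (sqrt_c * norm)

# Hypothetical batch of 4 feature vectors with large norms.
rng = np.random.default_rng(0)
features = rng.normal(scale=5.0, size=(4, 8))

# Clip first so the embedded points stay away from the ball's boundary,
# where tanh saturates and gradients become very small.
embedded = expmap0(clip_features(features, max_norm=2.0))
print(np.linalg.norm(embedded, axis=-1))  # every norm is below 1
```

Bounding the input norm keeps `tanh` out of its saturated region, which is one common way to ease the vanishing gradient issue the abstract mentions.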
rPPG-MAE: Self-supervised Pre-training with Masked Autoencoders for Remote Physiological Measurement
Remote photoplethysmography (rPPG) is an important technique for perceiving
human vital signs, which has received extensive attention. For a long time,
researchers have focused on supervised methods that rely on large amounts of
labeled data. These methods are limited by the requirement for large amounts of
data and the difficulty of acquiring ground truth physiological signals. To
address these issues, several self-supervised methods based on contrastive
learning have been proposed. However, they focus on contrastive learning
between samples, neglecting the inherent self-similar prior in physiological
signals, and seem to have limited ability to cope with noise. In this paper, a
linear self-supervised reconstruction task was designed for extracting the
inherent self-similar prior in physiological signals. Besides, a specific
noise-insensitive strategy was explored for reducing the interference of motion
and illumination. The proposed framework in this paper, namely rPPG-MAE,
demonstrates excellent performance even on the challenging VIPL-HR dataset. We
also evaluate the proposed method on two public datasets, namely PURE and
UBFC-rPPG. The results show that our method not only outperforms existing
self-supervised methods but also exceeds the state-of-the-art (SOTA) supervised
methods. One important observation is that the quality of the dataset seems
more important than the size in self-supervised pre-training of rPPG. The
source code is released at https://github.com/linuxsino/rPPG-MAE